add FanOutMapper for one-to-many partition fan-out #66030
Draft
Lee-W wants to merge 17 commits into apache:main from
Conversation
only the last commit matters.
e4d0931 to dd3408e
this is still not ideal, but at least it's not super wrong now
…t ordering
StartOfWeekMapper and StartOfQuarterMapper now derive their decode_downstream
regex from output_format itself, so users can re-order strftime directives
and {name} placeholders (e.g. "Q{quarter}/%Y") without having to override
decode_downstream. Malformed output_format — empty {}, non-identifier
placeholder names, duplicate %X directives, duplicate {name} placeholders —
raises ValueError at mapper construction instead of an opaque re.error from
deep inside a scheduler tick or UI route.
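A hypothetical sketch (not the PR's actual code) of how a decode regex can be derived from an `output_format` that mixes strftime directives with `{name}` placeholders, e.g. `"Q{quarter}/%Y"`, so that a malformed format raises `ValueError` up front rather than an opaque `re.error` later. The directive-to-pattern table and function name are assumptions for illustration.

```python
import re

# Hypothetical directive table; the real mapper may support more directives.
_DIRECTIVE_PATTERNS = {"%Y": r"\d{4}", "%m": r"\d{2}", "%d": r"\d{2}"}


def derive_decode_regex(output_format: str) -> re.Pattern:
    """Build a decode regex from output_format, validating it eagerly."""
    parts = []
    seen_directives: set[str] = set()
    seen_names: set[str] = set()
    i = 0
    while i < len(output_format):
        ch = output_format[i]
        if ch == "%":
            directive = output_format[i:i + 2]
            if directive not in _DIRECTIVE_PATTERNS:
                raise ValueError(f"unsupported strftime directive {directive!r}")
            if directive in seen_directives:
                raise ValueError(f"duplicate strftime directive {directive!r}")
            seen_directives.add(directive)
            parts.append(_DIRECTIVE_PATTERNS[directive])
            i += 2
        elif ch == "{":
            end = output_format.find("}", i)
            if end == -1:
                raise ValueError("unclosed '{' placeholder")
            name = output_format[i + 1:end]
            if not name.isidentifier():
                raise ValueError(f"placeholder name {name!r} is not an identifier")
            if name in seen_names:
                raise ValueError(f"duplicate placeholder {{{name}}}")
            seen_names.add(name)
            parts.append(rf"(?P<{name}>\d+)")  # capture placeholder as digits
            i = end + 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts) + "$")
```

With this shape, re-ordered formats like `"Q{quarter}/%Y"` decode without overriding `decode_downstream`, and an empty `{}` or repeated `%Y` fails at construction.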
…ag_runs list

Drop the SQL "count distinct assets with any log" subquery and always compute total_received via the Python rollup-aware helper. The list endpoint previously returned different numbers for the same APDR depending on whether the caller filtered by dag_id (rollup-aware, counts upstream window keys) or queried globally (SQL approximation, counts assets with any log): same field, different semantics, very confusing for any UI consumer.

The N+1 cost of per-Dag timetable loads was already paid in the global branch for total_required, so adding a single batched log fetch keeps the existing query budget while making the contract identical across both views. _compute_received_count now skips asset_ids that are no longer required (active=False) so the relaxed log query doesn't over-count.
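The over-counting guard described above can be sketched as follows; the function name and the `required` mapping shape are illustrative, not the PR's actual signatures:

```python
def compute_received_count(required: dict[int, bool], logged_asset_ids: list[int]) -> int:
    """Count received assets, skipping those no longer required.

    required maps asset_id -> active flag; a relaxed (batched) log query may
    return logs for retired assets, so only active asset_ids are counted.
    """
    active = {aid for aid, is_active in required.items() if is_active}
    return len({aid for aid in logged_asset_ids if aid in active})
```

Deduplicating via a set also keeps repeated logs for the same asset from inflating the count.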
StartOfWeekMapper now always uses ISO weeks (Monday) and StartOfMonthMapper always emits the 1st of the month. Custom fiscal boundaries can still be expressed by pairing a user-defined source mapper with the existing windows.
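The fixed boundaries are simple to state in `datetime` terms (a minimal illustration, not the mappers' internal code):

```python
from datetime import date, timedelta


def start_of_iso_week(d: date) -> date:
    # ISO weeks start on Monday; date.weekday() returns 0 for Monday.
    return d - timedelta(days=d.weekday())


def start_of_month(d: date) -> date:
    # Month windows always start on the 1st.
    return d.replace(day=1)
```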
The next_run_assets and partitioned_dag_runs endpoints used to load and deserialize the full timetable on every request just to read mapper attributes (is_rollup) and required-key counts. Mapper metadata is now cached per asset on DagModel during Dag sync via a new ``partition_mapper_info`` JSON column, so the UI resolves mapper attributes from the cache and only loads the timetable when ``to_upstream`` evaluation for rollup mappers is actually needed.
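A sketch of what such a per-asset cache lookup could look like; the JSON layout and key names here are assumptions for illustration, not the column's actual schema:

```python
# Hypothetical cached shape: enough to answer UI queries without
# deserializing the timetable.
partition_mapper_info = {
    "asset_1": {"is_rollup": True, "required_key_count": 7},
    "asset_2": {"is_rollup": False, "required_key_count": 1},
}


def is_rollup(info: dict, asset_key: str) -> bool:
    # Resolve the mapper attribute from the cache; the timetable would only
    # be loaded when to_upstream evaluation is actually required.
    return info.get(asset_key, {}).get("is_rollup", False)
```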
FanOutMapper composes upstream_mapper + window + (optional) fine_mapper, symmetric to RollupMapper. A new [scheduler] option, partition_fanout_max_keys, caps the number of downstream keys generated per upstream event.
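The 1→N expansion and its cap can be sketched like this (names are illustrative, not the PR's exact API; a weekly→daily window is used as the example):

```python
from datetime import date, timedelta


def fan_out(upstream_key: date, window_days: int, max_keys: int) -> list[date]:
    """Expand one upstream key into window_days downstream daily keys.

    max_keys plays the role of the [scheduler] partition_fanout_max_keys cap:
    it turns the unbounded-keys footgun into an explicit error.
    """
    if window_days > max_keys:
        raise ValueError(
            f"fan-out of {window_days} keys exceeds partition_fanout_max_keys={max_keys}"
        )
    return [upstream_key + timedelta(days=i) for i in range(window_days)]
```

An optional fine_mapper would then format each downstream key, mirroring how RollupMapper composes in the N→1 direction.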
Why
Roll-up partitioning currently handles N→1 (many upstream keys feeding one downstream run), but the symmetric 1→N case — one weekly key fanning out into seven daily Dag runs — has no first-class mapper. Authors had to hand-roll fan-out logic per Dag, which made the unbounded-keys footgun easy to hit.
closes: #65654
What
Was generative AI tooling used to co-author this PR?
Generated-by: [Claude] following the guidelines
{pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.